Appendix Contents

Neural Information Processing Systems

Every moral scenario consists of a triple (context, action 1, action 2) and a set of auxiliary labels. The actions describe two possible actions in the first person (e.g., …). The moral scenarios can be categorized into: 1. MoralChoice-LowAmbiguity (…). The LLM-assisted construction (i.e., zero- and few-shot prompting setups) of the scenarios is grounded in a set of rules (Category / Rule / Refined Rule Description):

Category      Rule                       Refined Rule Description
Do not harm   Do not kill                Do not kill (i.e., do not cause permanent loss of consciousness).
Do not harm   Do not cause pain          Do not cause physical or emotional pain or unpleasant feelings (e.g., anger, sadness) to someone.
Do not harm   Do not disable             Do not deprive someone of their physical, mental or volitional ability (e.g., …).
Do not harm   Do not deprive of freedom  Do not deprive someone of their freedom (i.e., make a person unable to do something by altering the person's environment or situation).





Convergence of No-Swap-Regret Dynamics in Self-Play

Neural Information Processing Systems

Despite growing interest in understanding and predicting the long-term behavior of learning dynamics in games, recent studies have revealed a wide array of negative results, demonstrating the elusiveness of game dynamics.


Evaluating the Moral Beliefs Encoded in LLMs

Neural Information Processing Systems

This paper presents a case study on the design, administration, post-processing, and evaluation of surveys on large language models (LLMs). It comprises two components: (1) a statistical method for eliciting beliefs encoded in LLMs.
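
This style of elicitation can be sketched as a small sampling loop: present the same forced-choice scenario under several prompt formats, sample repeatedly, and estimate how often the model selects one action. The sketch below is illustrative only; the `ask_model` callable and the format functions are placeholder assumptions, not the paper's actual protocol.

```python
from collections import Counter

def elicit_choice_prob(ask_model, scenario, formats, samples_per_format=5):
    """Estimate P(model picks 'action 1') averaged over prompt formats.

    ask_model: callable prompt -> 'action 1' | 'action 2' (placeholder
    for an LLM API call); formats: callables scenario -> prompt string.
    """
    counts = Counter()
    for fmt in formats:
        for _ in range(samples_per_format):
            counts[ask_model(fmt(scenario))] += 1
    total = sum(counts.values())
    return counts["action 1"] / total if total else 0.0
```

Averaging over formats is what makes the estimate a property of the model's encoded belief rather than of any single prompt template.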


Deliberate Planning in Language Models with Symbolic Representation

Xiong, Siheng, Liu, Zhangding, Zhou, Jieyu, Su, Yusen

arXiv.org Artificial Intelligence

Planning remains a core challenge for large language models (LLMs), particularly in domains that require coherent multi-step action sequences grounded in external constraints. We introduce SymPlanner, a novel framework that equips LLMs with structured planning capabilities by interfacing them with a symbolic environment that serves as an explicit world model. Rather than relying purely on natural language reasoning, SymPlanner grounds the planning process in a symbolic state space, where a policy model proposes actions and a symbolic environment deterministically executes and verifies their effects. To enhance exploration and improve robustness, we introduce Iterative Correction (IC), which refines previously proposed actions by leveraging feedback from the symbolic environment to eliminate invalid decisions and guide the model toward valid alternatives. Additionally, Contrastive Ranking (CR) enables fine-grained comparison of candidate plans by evaluating them jointly. Conceptually, SymPlanner operationalizes two cognitive faculties: (i) error monitoring and repair via externalized feedback (IC) and (ii) preference formation among alternatives via pairwise comparison (CR), advancing cognitively plausible, symbol-grounded planning aligned with the rich structure in intelligent systems. We evaluate SymPlanner on PlanBench, demonstrating that it produces more coherent, diverse, and verifiable plans than pure natural language baselines.
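
The propose/execute/verify loop with Iterative Correction can be sketched in a few lines. Everything below is a toy stand-in (assumption): the symbolic environment is a set of facts with (preconditions, effects) actions, and the policy is any callable; none of it is the authors' actual SymPlanner API.

```python
class SymbolicEnv:
    """Toy symbolic world model: a set of facts; an action is a
    (preconditions, effects) pair of frozensets of facts."""
    def __init__(self, facts):
        self.facts = set(facts)

    def is_valid(self, action):
        preconditions, _ = action
        return preconditions <= self.facts

    def execute(self, action):
        _, effects = action
        self.facts |= effects

def plan_with_ic(env, propose, goal, max_steps=10, max_retries=3):
    """Policy proposes actions; the environment verifies them.
    Invalid proposals trigger retries with feedback (Iterative
    Correction); valid ones are executed and appended to the plan."""
    plan = []
    for _ in range(max_steps):
        feedback = None
        for _ in range(max_retries):
            action = propose(env.facts, goal, feedback)
            if env.is_valid(action):
                break
            feedback = f"unmet preconditions: {sorted(action[0] - env.facts)}"
        else:
            return None  # policy never produced a valid action
        env.execute(action)
        plan.append(action)
        if goal <= env.facts:
            return plan
    return None
```

The key design point is that validity is decided by the deterministic environment, not by the language model, so every action in a returned plan is verifiable by construction.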


Efficient Multi-Agent Collaboration with Tool Use for Online Planning in Complex Table Question Answering

Zhou, Wei, Mesgar, Mohsen, Friedrich, Annemarie, Adel, Heike

arXiv.org Artificial Intelligence

Complex table question answering (TQA) aims to answer questions that require complex reasoning, such as multi-step or multi-category reasoning, over data represented in tabular form. Previous approaches demonstrated notable performance by leveraging either closed-source large language models (LLMs) or fine-tuned open-weight LLMs. However, fine-tuning LLMs requires high-quality training data, which is costly to obtain, and utilizing closed-source LLMs poses accessibility challenges and leads to reproducibility issues. In this paper, we propose Multi-Agent Collaboration with Tool use (MACT), a framework that requires neither closed-source models nor fine-tuning. In MACT, a planning agent and a coding agent that also make use of tools collaborate to answer questions. Our experiments on four TQA benchmarks show that MACT outperforms previous SoTA systems on three out of four benchmarks and that it performs comparably to the larger and more expensive closed-source model GPT-4 on two benchmarks, even when using only open-weight models without any fine-tuning. We conduct extensive analyses to prove the effectiveness of MACT's multi-agent collaboration in TQA.
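
The planner/coder control flow can be illustrated with a toy loop. Both "agents" below are deterministic stand-ins for LLMs (an assumption for demonstration, not the MACT implementation): the planner decomposes the question into steps, and the coder executes each step as Python against the table, with plain `exec` standing in for the tool set.

```python
def planner(question, scratchpad):
    """Emit the next step, or FINISH once an answer has been computed.
    A real system would call an LLM here; this toy handles 'max of <col>'."""
    if "answer" in scratchpad:
        return ("FINISH", scratchpad["answer"])
    if question.startswith("max of "):
        column = question[len("max of "):]
        return ("CODE", f"answer = max(row['{column}'] for row in table)")
    raise ValueError("unsupported question")

def coder(step_code, table, scratchpad):
    """Execute the planner's step against the table and record the result."""
    env = {"table": table, **scratchpad}
    exec(step_code, {}, env)
    scratchpad["answer"] = env["answer"]

def mact_answer(question, table, max_turns=5):
    """Alternate between the two agents until the planner finishes."""
    scratchpad = {}
    for _ in range(max_turns):
        kind, payload = planner(question, scratchpad)
        if kind == "FINISH":
            return payload
        coder(payload, table, scratchpad)
    return None
```

Separating planning from code execution is what lets each role be filled by a smaller open-weight model, since neither agent has to both reason about the question and manipulate the table in one shot.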


Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization

Zhao, Yunfan, Behari, Nikhil, Hughes, Edward, Zhang, Edwin, Nagaraj, Dheeraj, Tuyls, Karl, Taneja, Aparna, Tambe, Milind

arXiv.org Artificial Intelligence

Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when arms opt-in and opt-out over time, a common challenge in many real world applications. We address these limitations by developing a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs, and which can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. Our model also accommodates general multi-action settings and discrete or continuous state spaces. To enable fast generalization, we learn a novel single policy network model that utilizes feature information and employs a training procedure in which arms opt-in and out over time. We derive a new update rule for a crucial $\lambda$-network with theoretical convergence guarantees and empirically demonstrate the advantages of our approach on several challenging, real-world inspired problems.
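
One step of the index-based arm selection this setting requires can be sketched as follows. The scoring function here is an arbitrary heuristic passed in by the caller (an assumption; PreFeRMAB instead learns it with a feature-conditioned policy network and the lambda-network), but the control flow shows how arms opting in and out is handled without retraining the selection logic.

```python
import heapq

def select_arms(states, opted_in, index_fn, budget):
    """Act on the top-`budget` opted-in arms, ranked by an index.

    states: {arm_id: state}; index_fn scores a state (stand-in for a
    learned index/policy network). Arms outside `opted_in` are simply
    skipped, so the opt-in set can change between timesteps.
    """
    scored = [(index_fn(s), arm)
              for arm, s in states.items() if arm in opted_in]
    return {arm for _, arm in heapq.nlargest(budget, scored)}
```

Because the scorer takes only the arm's state (and, in the learned version, its features), a single network can rank a previously unseen set of arms, which is what makes zero-shot generalization across RMAB instances possible.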